National Repository of Grey Literature 10 records found  Search took 0.00 seconds. 
Word2vec Models with Added Context Information
Šůstek, Martin ; Rozman, Jaroslav (referee) ; Zbořil, František (advisor)
This thesis is concerned with the explanation of the word2vec models. Even though word2vec was introduced recently (2013), many researchers have already tried to extend, understand or at least use the model because it provides surprisingly rich semantic information. This information is encoded in N-dim vector representation and can be recall by performing some operations over the algebra. As an addition, I suggest a model modifications in order to obtain different word representation. To achieve that, I use public picture datasets. This thesis also includes parts dedicated to word2vec extension based on convolution neural network.
Fast Adaptation of Codenames Computer Assistant for New Languages
Jareš, Petr ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis extends a system of an artificial player of a word-association game Codenames to easy addition of support for new languages. The system is able to play Codenames in roles as a guessing player, a clue giver or, by their combination a Duet version player. For analysis of different languages a neural toolkit Stanza was used, which is language independent and enables automated processing of many languages. It was mainly about lemmatization and part of speech tagging for selection of clues in the game. For evaluation of word associations were several models tested, where the best results had a method Pointwise Mutual Information and predictive model fastText. The system supports playing Codenames in 36 languages comprising 8 different alphabets.
Neural Network Based Named Entity Recognition
Straková, Jana ; Hajič, Jan (advisor) ; Černocký, Jan (referee) ; Konopík, Miloslav (referee)
Title: Neural Network Based Named Entity Recognition Author: Jana Straková Institute: Institute of Formal and Applied Linguistics Supervisor of the doctoral thesis: prof. RNDr. Jan Hajič, Dr., Institute of Formal and Applied Linguistics Abstract: Czech named entity recognition (the task of automatic identification and classification of proper names in text, such as names of people, locations and organizations) has become a well-established field since the publication of the Czech Named Entity Corpus (CNEC). This doctoral thesis presents the author's research of named entity recognition, mainly in the Czech language. It presents work and research carried out during CNEC publication and its evaluation. It fur- ther envelops the author's research results, which improved Czech state-of-the-art results in named entity recognition in recent years, with special focus on artificial neural network based solutions. Starting with a simple feed-forward neural net- work with softmax output layer, with a standard set of classification features for the task, the thesis presents methodology and results, which were later used in open-source software solution for named entity recognition, NameTag. The thesis finalizes with a recurrent neural network based recognizer with word embeddings and character-level word embeddings,...
Fast Adaptation of Codenames Computer Assistant for New Languages
Jareš, Petr ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis extends a system of an artificial player of a word-association game Codenames to easy addition of support for new languages. The system is able to play Codenames in roles as a guessing player, a clue giver or, by their combination a Duet version player. For analysis of different languages a neural toolkit Stanza was used, which is language independent and enables automated processing of many languages. It was mainly about lemmatization and part of speech tagging for selection of clues in the game. For evaluation of word associations were several models tested, where the best results had a method Pointwise Mutual Information and predictive model fastText. The system supports playing Codenames in 36 languages comprising 8 different alphabets.
Analýza textových používateľských hodnotení vybranej skupiny produktov
Valovič, Roman
This work focuses on the design of a system that identifies frequently discussed product features in product reviews, summarizes them, and displays them to the user in terms of sentiment. The work deals with the issue of natural language processing, with a specific focus on Czech languague. The reader will be introduced the methods of preprocessing the text and their impact on the quality of the analysis results. The identification of the mainly discussed products features is carried out by cluster analysis using the K-Means algorithm, where we assume that sufficiently internally homogeneous clusters will represent the individual features of the products. A new area that will be explored in this work is the representation of documents using the Word embeddings technique, and its potential of using vector space as input for machine learning algorithms.
Codenames: a practical application for modelling word association
de Rijk, Micha Theo Neri ; Mareček, David (advisor) ; Popel, Martin (referee)
Micha de Rijk January 6, 2020 Word association is an important part of human language. Many techniques for capturing semantic relations between words exist, but their ability to model word associations is rarely tested. We introduce the game of Codenames with one human player as a word association task to evaluate how well a language model captures this information. We establish the baseline f-score of 0.362 and explore the performance of several collocations and word embedding models on this task. Our best model uses fastText word embeddings and achieves an f-score of 0.789 for Czech and 0.751 for English. 1
Genres classification by means of machine learning
Bílek, Jan ; Neruda, Roman (advisor) ; Vomlelová, Marta (referee)
In this thesis, we compare the bag of words approach with doc2vec doc- ument embeddings on the task of classification of book genres. We cre- ate 3 datasets with different text lengths by extracting short snippets from books in Project Gutenberg repository. Each dataset comprises of more than 200000 documents and 14 different genres. For 3200-character documents, we achieve F1-score of 0.862 when stacking models trained on both bag of words and doc2vec representations. We also explore the relationships be- tween documents, genres and words using similarity metrics on their vector representations and report typical words for each genre. As part of the thesis, we also present an online webapp for book genre classification. 1
Analysis of stock market sentiment with social media
Čermák, Vojtěch ; Baruník, Jozef (advisor) ; Vacek, Pavel (referee)
In the thesis, we explored prospects of extracting sentiment contained in Twitter messages. We proposed novel approach consisting of directly predicting the volatility on stock market by features obtained from the text documents using suitable document representation. We compared the performance of standard document vectorisation methods as well as a novel approach based on aggregating word vectors created by word embeddings. We showed that direct modelling of a market variable is possible with most of the proposed vectorisation techniques. In particular, the strong predictive power of aggregated word embeddings suggests that they are excellent sentiment representation, because they are independent of message volume and they capture well the semantical information in the tweets. Besides, our findings suggest that aggregating word embeddings vectorisation is viable approach even for large documents.
Neural Network Based Named Entity Recognition
Straková, Jana ; Hajič, Jan (advisor) ; Černocký, Jan (referee) ; Konopík, Miloslav (referee)
Title: Neural Network Based Named Entity Recognition Author: Jana Straková Institute: Institute of Formal and Applied Linguistics Supervisor of the doctoral thesis: prof. RNDr. Jan Hajič, Dr., Institute of Formal and Applied Linguistics Abstract: Czech named entity recognition (the task of automatic identification and classification of proper names in text, such as names of people, locations and organizations) has become a well-established field since the publication of the Czech Named Entity Corpus (CNEC). This doctoral thesis presents the author's research of named entity recognition, mainly in the Czech language. It presents work and research carried out during CNEC publication and its evaluation. It fur- ther envelops the author's research results, which improved Czech state-of-the-art results in named entity recognition in recent years, with special focus on artificial neural network based solutions. Starting with a simple feed-forward neural net- work with softmax output layer, with a standard set of classification features for the task, the thesis presents methodology and results, which were later used in open-source software solution for named entity recognition, NameTag. The thesis finalizes with a recurrent neural network based recognizer with word embeddings and character-level word embeddings,...
Word2vec Models with Added Context Information
Šůstek, Martin ; Rozman, Jaroslav (referee) ; Zbořil, František (advisor)
This thesis is concerned with the explanation of the word2vec models. Even though word2vec was introduced recently (2013), many researchers have already tried to extend, understand or at least use the model because it provides surprisingly rich semantic information. This information is encoded in N-dim vector representation and can be recall by performing some operations over the algebra. As an addition, I suggest a model modifications in order to obtain different word representation. To achieve that, I use public picture datasets. This thesis also includes parts dedicated to word2vec extension based on convolution neural network.

Interested in being notified about new results for this query?
Subscribe to the RSS feed.